Evolutionary Equilibrium with Forward-looking Players
Abstract
The population game literature builds upon an implicit model of player interaction, pairwise random matching, and three behavioral postulates: noisy decisionmaking, myopic decisionmaking, and the random arrival of choice opportunities (inertia, or lock-in). By now the role of noise is understood. This paper investigates the role of the player interaction model and the other two behavioral postulates by building two distinct, fully intertemporal population models in which players have rational expectations about the future and discount future expected payoff streams. Not surprisingly, myopic play emerges in both models as the discount rate becomes large. In one model it also arises as lock-in increases. The two models exhibit distinct myopic behavior. Specialized to coordination games, only in one model is there selection of the risk-dominant equilibrium. The most surprising result is that in neither model does patient play lead to payoff-dominant equilibrium selection. Quite the contrary: if players are patient enough, the basin of attraction for the risk-dominant equilibrium state enlarges to include the entire state space.

JEL Classification: C78

Correspondent:
Professor Lawrence Blume
Department of Economics
Uris Hall
Cornell University
Ithaca, NY 14853
[email protected]

Eugène Delacroix's (1828) image of Mephistopheles galloping off with Faust illustrates the consequences of impatience. This paper determines if patience has its own rewards.

Version: 27 June 1995

1. Modelling the Evolution of Strategic Choice

The canonical stochastic population model assumes a population of players who from time to time are randomly paired with each other in a strategic relationship described by a two-person game. They enter the match with a strategy choice already locked in, and are rewarded according to their choice and the choice of their opponent. Again from time to time, players have opportunities to revise their choices.
When a revision opportunity occurs, players best-respond to their beliefs with high probability, but not surely. These models admit several possible interpretations of the event that a player doesn't best respond, but the important fact is that this random component of play is unmodelled noise. That is, its existence is simply assumed.

The four characteristics which describe all models in this class are pairwise random matching, myopia, inertia and noisy choice. Pairwise random matching describes how players in the population interact with one another. Although it is never formally modelled, the idea is that frequently players are randomly matched with other players, and these matches are the source of all payoffs. From each match a player gets a payoff which is determined by her strategy and the strategy of her opponent. The remaining characteristics describe individuals' choice behavior. Myopia has to do with the modelling of the term "best-response" in the preceding description. It means that players respond to the expected payoff that would result from matches given the distribution of play in the current state. Inertia is the supposition that players cannot revise their strategies before each match, but instead only occasionally. And of course noisy choice describes the unmodelled random noise. Random experimentation, random utility and random replacement of players have all been used to justify the stochastic perturbation of best-response behavior.

Most of the existing literature fixes a myopic decision rule and a specification of inertia, and studies the impact upon the model's long-run behavior of changes in the characteristics of the unmodelled noise, especially as that noise is made small. In this paper I propose to study the effects of alternatives to myopic best response, variations in the rate of strategy revision, and assumptions about the nature of player interaction alternative to pairwise random matching.
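Myopic best response in this sense is a purely static computation. As a rough sketch (the payoff matrix, population counts and strategy labels below are invented for illustration and are not from the paper), a player simply maximizes her expected match payoff against the current distribution of opponents' play:

```python
import numpy as np

# A hypothetical 2x2 symmetric game (payoffs assumed for
# illustration; strategies indexed 0 = A, 1 = B).
G = np.array([[3.0, 0.0],
              [2.0, 2.0]])

def myopic_best_response(G, counts):
    """Best response to the current state of the population.

    counts[k] is the number of opponents currently locked in to
    strategy k; the expected payoff of playing m is the
    count-weighted average of G[m, k] over opponents' strategies k.
    """
    freqs = np.asarray(counts, dtype=float)
    freqs /= freqs.sum()
    expected = G @ freqs          # expected match payoff of each strategy
    return int(np.argmax(expected))

# With 7 of 10 opponents on A, A's expected payoff is 0.7*3 = 2.1 > 2.0.
print(myopic_best_response(G, [7, 3]))  # → 0 (play A)
```

The point of the myopia postulate is that this calculation uses only the current state, ignoring how the state will evolve.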
The matching technology brings two players together for an instant of time, just as two billiard balls might collide while both are in motion. The payoffs from the match are determined by the choices the two players are locked in to at the instant of the match. This story has generated some interest as part of a stylized model of the evolution of choice in a population of players, but describes almost no interesting economic phenomena.(1) I will compare this interaction model with a model in which players meet and are bound together in an ongoing relationship for a random period of time, during which the players receive a continuous payoff flow. During the match players may have opportunities to revise their choice of strategy. The billiard-ball model I will refer to as the discrete match model, and the alternative model I call the continuous flow model. In both interaction models I will study how inertia and impatience interact to determine the short- and long-run behavior of the stochastic process describing players' strategies.

For both interaction models I prove the existence of equilibrium play when players are forward-looking and have "rational expectations" about the course of future play. This equilibrium concept is the most restrictive method for introducing forward-looking behavior, and is certainly not in harmony with the evolutionary paradigm, which eschews complex rationality requirements. Nonetheless it is of interest for four reasons.

(1) The Menger-Kiyotaki-Wright model of monetary exchange (Menger, 1892, and Kiyotaki and Wright, 1989) and some versions of Diamond's search equilibrium model (Diamond, 1982) are the only exceptions I have found so far, and even these models go beyond the simple formalisms of the contemporary evolutionary game theory literature.
First, upon any boundedly rational notion of forward-looking behavior that did not include the possibility of fully rational outcomes sits the burden of justification for its systematic biases. Second, it must be admitted that a rationality requirement that players understand some aggregate process is different from the inductive approach to rationality through common knowledge and common belief that models of bounded rationality in games are most interested in avoiding. Third, this is the way to consider forward-looking behavior without entering the interesting but tangential argument of exactly what kinds of forward-looking choice models would be most interesting to study. Fourth, full rationality is particularly interesting in the models presented here because in both interaction models the consequences of patient rational play are striking.

The myopia hypothesis is motivated by bounded rationality considerations. A central concern of this paper is the ways in which myopic play can emerge from the dynamic choice models suggested by the foregoing description of equilibrium choice, and what different forms myopia takes in the discrete match and continuous flow models. The significance of the inertia hypothesis is two-fold. First, random matching opportunities remove certain equilibrium possibilities, such as "all play up in even periods and down in odd periods" in a coordination game. Second, inertia affects the tradeoff between present and future rewards in strategic choice. When choice is myopic this is not important; but when players are forward-looking, they will trade off the short-term benefits of playing optimally in the current choice environment against the long-term cost of possibly being locked into a bad choice for the environment of the future.

The characterizations of the emergence of myopic behavior in both interaction models are valid for all symmetric games and do not depend on the magnitude of the noise. The results on patient play are limited in three ways.
First, I investigate only two-by-two coordination games, the benchmark strategic problem of the stochastic evolution literature. Second, the results assume that the stochastic noise is small; in other words, all results are "equilibrium selection" results. Third, I study only Markov, symmetric and "monotonic" equilibria: equilibria in which all players use the same Markovian decision rule, and in which that rule is of the form "play A if enough players are choosing A, otherwise choose B". There is reason to believe that for generic coordination games, monotonic equilibria are the only equilibria when players are patient and tremble rarely. But for the moment this claim must remain a conjecture.

Both interaction models are described by three parameters: r, the intertemporal discount rate; α, the arrival rate of new matches; and ρ, the arrival rate of strategy revision opportunities. As expected, myopic play arises as players become impatient, that is, as r becomes large. More surprising is that in the discrete match model, myopic behavior emerges for any discount rate when the arrival rate of revision opportunities, ρ, is sufficiently small. This includes situations wherein all players are extremely patient.

The consequences of myopic play are quite different in the two models. While myopic play in the discrete match model affords the usual equilibrium selection results in coordination games, there is no equilibrium selection result in the myopic continuous flow model. Both equilibria of a coordination game are stochastically stable states (and these are the only such states), and the invariant distribution depends upon the distribution of choice in the event of a tremble. Although the myopic play of the two models differs, their behavior when players are patient is identical. The surprising result here is the strong emergence of risk-dominant equilibrium selection.
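To fix ideas, risk dominance and payoff dominance can pull in opposite directions. The sketch below (the payoff values are hypothetical, chosen only to exhibit that tension) classifies the two pure equilibria of a symmetric 2×2 coordination game by the standard Harsanyi-Selten deviation-loss comparison:

```python
import numpy as np

# Hypothetical coordination game: (A,A) is payoff-dominant while
# (B,B) is risk-dominant -- the tension discussed in the text.
G = np.array([[5.0, 0.0],
              [4.0, 3.0]])

def classify(G):
    """Return (payoff-dominant, risk-dominant) equilibrium labels
    for a symmetric 2x2 coordination game with payoffs
    a = G[A,A], b = G[A,B], c = G[B,A], d = G[B,B]."""
    a, b = G[0, 0], G[0, 1]
    c, d = G[1, 0], G[1, 1]
    payoff_dominant = 'A' if a > d else 'B'
    # Harsanyi-Selten: (A,A) risk-dominates (B,B) iff the product of
    # deviation losses favors it; by symmetry this reduces to
    # comparing the deviation losses a - c and d - b.
    risk_dominant = 'A' if (a - c) > (d - b) else 'B'
    return payoff_dominant, risk_dominant

print(classify(G))  # → ('A', 'B')
```

Here a − c = 1 < d − b = 3, so B needs fewer opponents playing it to be a best response: its basin of attraction is larger, which is what the selection results exploit.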
Just as when preferences are myopic, the risk-dominant equilibrium has the larger basin of attraction; but in the limit case of extremely patient players, the basin of attraction expands to the entire state space. These conclusions are independent of the level of noise, and so in fact no noise is needed to get the now-conventional risk-dominant equilibrium selection result. When players are patient, decreasing the noise decreases rather than increases the expected waiting time for a transition from coordination on the risk-dominated equilibrium to coordination on the risk-dominant equilibrium.

Models in the population game literature have been implemented in three ways. Canning (1990), Kandori, Mailath and Rob (1993), Young (1993) and Samuelson (1994) all implement the evolutionary story as a discrete-time Markov chain. Foster and Young (1990) and Fudenberg and Harris (1992) implement this model as a Brownian motion. Blume (1993a, 1993b and 1994) implements this model as a continuous-time jump process. In particular, when all players are ex-ante identical, this formulation is a continuous-time birth-death process. Binmore, Samuelson and Vaughan (1993) use a birth-death chain approximation to analyze a model with more complicated Markovian evolution. Birth-death models have an advantage over those models in which at any date many players may revise their strategy: they are significantly easier to analyze. In particular, they make possible the study of the dynamic programming problems necessary to understand forward-looking behavior. Consequently I will use the formulation of evolutionary models with random matching as developed in Blume (1994).

2. The Models

The two models share a common structure concerning strategy revision opportunities, instantaneous payoffs and the like. They differ only in the matching technology: how it is that matches actually generate payoffs. This section first describes the common elements.
Then the matching technologies and the resulting dynamic programming problems are taken up in turn for the discrete match and continuous flow models.

2.1. Common Elements

A stochastic strategy revision process is a population process which describes the distribution among a set of strategies of a population of players. The population has N players, named player 1 through player N. Given too is a payoff matrix for a K × K symmetric game G. Without loss of generality we will assume that all entries in G are strictly positive. Each player is labelled with a strategy. The state of the population process at time t is a vector which describes the distribution of strategies in the player population. From time to time each player has an opportunity to revise her strategy. A policy for a player is a map that assigns to each state a probability distribution over strategies, from which she would draw the strategy she would choose if given the opportunity to revise her strategy at an instant when the process is in that state. Such policies are called Markovian in the dynamic programming literature. In general, of course, one would want to allow dependence upon the entire observable history of the process. But it will become apparent that if all other players are using Markovian policies, then any one player has a Markovian optimal response.

The phrase "from time to time" has a very specific meaning. An event that happens "from time to time" is an event which happens at random intervals whose evolution is described by a Poisson alarm clock. Consider the arrival of strategy revision opportunities for player n. Associated with player n is a collection {x_nl}, l = 1, 2, ..., of independent random variables, each distributed exponentially with rate parameter ρ (mean 1/ρ). The variable x_n1 is the waiting time until the first strategy revision opportunity, x_n2 is the interarrival time between the first and second strategy revision opportunities, and so forth.
Thus the waiting time until the mth strategy revision opportunity for player n is x_n1 + ... + x_nm. Finally, when a player chooses a strategy k, that strategy may not in fact be implemented. Some other choice l may be realized instead. Let q = (q_1, ..., q_K) be a completely mixed probability distribution on the set of strategies. Also fix an ε strictly between 0 and 1. With probability 1 − ε the player's choice is realized, and with probability ε the player's new choice is chosen by a draw from the distribution q.

Each player discounts the future at rate r. Given the policies of other players, the expected present discounted value of the payoff stream from a policy for any player can be computed. Each player chooses a policy to maximize that expected present discounted value. A stochastic strategy revision process is an equilibrium strategy revision process if each player's policy maximizes that expected present discounted value given the policies of all other players.

2.2. The Discrete Match Model

From time to time a pair of players are brought together in a match. They receive an instantaneous payoff from the match which is determined by the strategy each player is employing at the time of the match and the payoff matrix G. "From time to time" means the following: For each pair of players (m, n) with m < n there is a collection {x_mnl}, l = 1, 2, ..., of independent, rate α/(N − 1) exponentially distributed random variables, where x_mnl is the interarrival time between the (l − 1)th and lth match of players m and n. It follows from the properties of independent, exponentially distributed random variables that the interarrival time between the (l − 1)th and lth matches of player n with anybody is exponentially distributed with rate parameter α. Notice that matching cannot be independent across players (although for large N it is approximately so). This does not matter.
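The Poisson alarm-clock structure is easy to simulate. In the sketch below the numerical rates ρ = 1, α = 2, the population size, and the horizon are assumptions for illustration; superposing a given player's N − 1 pair clocks, each running at rate α/(N − 1), recovers match arrivals to that player at rate α, as the text asserts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: revision opportunities at rate rho per
# player, matches at rate alpha/(N-1) per pair of players.
rho, alpha, N, horizon = 1.0, 2.0, 10, 10_000.0

def arrival_times(rate, horizon, rng):
    """Sample one Poisson alarm clock: accumulate i.i.d. exponential
    interarrival times (mean 1/rate) until the horizon is passed."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            return np.array(times)
        times.append(t)

revisions = arrival_times(rho, horizon, rng)
pair_clocks = [arrival_times(alpha / (N - 1), horizon, rng)
               for _ in range(N - 1)]
all_matches = np.sort(np.concatenate(pair_clocks))

print(len(revisions) / horizon)    # empirical rate, close to rho
print(len(all_matches) / horizon)  # empirical rate, close to alpha
```

The superposition step is exactly the property invoked above: a sum of independent Poisson clocks is a Poisson clock whose rate is the sum of the rates.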
The only independence requirements are the independence of revision opportunities and matches, and the independence of revision opportunities across players.

Choose a player, say, player 1. Let Δ^K_{N−1} denote the set of vectors of integers in R^K of the form a = (a_1, ..., a_K) where the a_k are nonnegative and Σ_k a_k = N − 1. Let Δ^K denote the set of probability distributions on the K pure strategies available to each player. A policy is a map σ: Δ^K_{N−1} → Δ^K. The argument of a policy is a list of the number of player 1's N − 1 fellow players who are currently playing each strategy. The stochastic process which describes the evolution of the opponents' behaviors is called the opponent process. Typically the current strategy of player 1 will affect the evolution of opponents' play, and so the evolution of the opponent process will be contingent upon the current choice of player 1. If all players employ a common Markovian policy, the decision problem facing each player will be that of controlling a birth-death process.

Suppose that all players other than player 1 have settled upon a common policy σ. The opponent process can change state only when one of the opponents has a strategy revision opportunity. Thus the only allowable changes of state are those in which one a_k decreases by 1 and another a_l increases by 1. This corresponds to an opponent currently playing k switching to l. The rate at which this happens depends upon the current choice of player 1. Suppose player 1 is currently playing strategy m. Let e_k, e_l and e_m denote the kth, lth and mth unit vectors in R^K, respectively. Let a denote the current state and b = a − e_k + e_l denote the new state. Then the transition rate is

    λ^m_{ab} = a_k ρ ((1 − ε) σ(a + e_m − e_k)_l + ε q_l).

There are a_k opponents currently choosing k, so strategy revision opportunities arrive collectively to the collection of k-players at rate a_k ρ.
If the state of player 1's opponent process is a, the state of the opponent process of an opponent playing k is a + e_m − e_k: a + e_m describes what all players are doing; then subtract off what the k-player herself is doing. Consequently, the term inside the parentheses describes the probability that a k-player will choose l. All transitions requiring two or more players to change strategies run at rate 0.

Notice that, given the choice of player 1, the opponent process is a multi-type birth-death process. Player 1 can affect the transition rates by changing strategies, although this effect becomes negligible as the number of players grows. When player 1 plays strategy k, her opponents' (aggregate) behavior is described by a multi-type birth-death process on the set Δ^K_{N−1} with transition rates Λ^k = {λ^k_{ab}}, a, b ∈ Δ^K_{N−1}. Let λ^k_a = Σ_{b ≠ a} λ^k_{ab}. This is the rate at which the opponent process changes state when player 1 plays strategy k.

Given the policy choice of her opponents, player 1's optimization problem is a conventional discounted stochastic dynamic programming problem. She has finitely many controls with which to control a Markov jump process. To describe the Bellman equation still more notation is needed. When player 1 is matched, the probability that she meets an opponent playing strategy l is proportional to the number of players choosing strategy l in the current state a. Thus her expected return from a match when the opponent process is in state a and she plays strategy k is
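The transition rate can be made concrete for a two-strategy game. In the sketch below, the monotone policy σ and the parameter values are assumptions chosen for illustration; the function evaluates a_k ρ ((1 − ε) σ(a + e_m − e_k)_l + ε q_l) for a jump a → a − e_k + e_l of player 1's opponent process when player 1 plays m:

```python
import numpy as np

# Illustrative parameters for a 2-strategy game (0 = A, 1 = B);
# states a = (a_A, a_B) with a_A + a_B = N - 1.
N, rho, eps = 6, 1.0, 0.1
q = np.array([0.5, 0.5])   # tremble distribution

def sigma(a):
    """Assumed monotone policy: play A iff at least as many of the
    other players are on A as on B, otherwise play B."""
    probs = np.zeros(2)
    probs[0 if a[0] >= a[1] else 1] = 1.0
    return probs

def transition_rate(a, k, l, m):
    """Rate of the jump a -> a - e_k + e_l in player 1's opponent
    process, given that player 1 currently plays m."""
    if k == l or a[k] == 0:
        return 0.0
    state_seen = np.array(a)
    state_seen[m] += 1   # a + e_m: what all N players are doing ...
    state_seen[k] -= 1   # ... minus what the revising k-player does
    return a[k] * rho * ((1 - eps) * sigma(state_seen)[l] + eps * q[l])

# From state (3, 2) with player 1 on A, a revising B-player sees
# (4, 1), so sigma picks A; she switches B -> A at rate
# 2 * 1.0 * (0.9 * 1 + 0.1 * 0.5) = 1.9.
print(transition_rate([3, 2], k=1, l=0, m=0))
```

With only two strategies every allowable jump changes one count by +1 and the other by −1, which is why the controlled opponent process is a birth-death process.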